Hierarchical cluster analysis of SAGE data for cancer profiling

Authors

Raymond T. Ng

Jörg Sander

Monica C. Sleumer

Abstract

In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer at the subcellular level. The data, however, are extremely high-dimensional, and the measurement method introduces many errors and missing values, which challenges any clustering algorithm. We therefore introduce special preprocessing techniques to reduce these errors and to restore missing data. These techniques are tailored to the process that generates the data and make only very conservative changes. Furthermore, we present a new subspace selection technique that uses the Wilcoxon test to identify a relevant subset of attributes (genes). This is a general technique that can be applied to select subspaces for clustering whenever some high-level categories of interest are known for the data (such as cancerous and noncancerous). Finally, we discuss the results of applying the clustering algorithm OPTICS to the SAGE data, before and after our preprocessing steps.
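The subspace selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that the Wilcoxon test is used with known categories. The sketch assumes a two-sided Wilcoxon rank-sum test with a normal approximation (no tie correction) and a significance cutoff `alpha`. The function names, the data layout (a dict from gene to per-library expression values), and the toy numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        return 1.0  # no comparison is possible with an empty group
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])  # rank sum of the first group
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_genes(expression, is_cancerous, alpha=0.05):
    """Keep genes whose expression differs between the two known categories.

    expression: dict mapping gene name -> list of values, one per library.
    is_cancerous: list of booleans, one per library (hypothetical labels).
    alpha: significance cutoff (an assumed value, not from the paper).
    """
    selected = []
    for gene, values in expression.items():
        cancer = [v for v, flag in zip(values, is_cancerous) if flag]
        normal = [v for v, flag in zip(values, is_cancerous) if not flag]
        if rank_sum_p_value(cancer, normal) < alpha:
            selected.append(gene)
    return selected


# Toy data: 5 cancerous and 5 noncancerous libraries, two made-up genes.
labels = [True] * 5 + [False] * 5
toy = {
    "g1": [10, 11, 12, 13, 14, 1, 2, 3, 4, 5],  # clearly separated
    "g2": [1, 4, 5, 8, 9, 2, 3, 6, 7, 10],      # thoroughly mixed
}
chosen = select_genes(toy, labels)
```

The selected genes would then define the subspace passed to a clustering algorithm such as OPTICS. With realistic SAGE library counts, an exact test or a multiple-testing correction may be more appropriate than the normal approximation assumed here.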


Related resources

Multivariate Chemometrics with Regression and Classification Analyses in Heroin Profiling Based on the Chromatographic Data.

The purpose of this work is to promote and facilitate forensic profiling and chemical analysis of illicit drug samples in order to determine their origin, methods of production and transfer through the country. The article is based on the gas chromatography analysis of heroin samples seized from three different locations in Serbia. Chemometric approach with appropriate statistical tools (multip...


Efficient Agglomerative clustering Method for Micro Array Data on Breast Cancer Outcome

Analysis of micro arrays presents a number of unique challenges for data mining. The main types of data analysis needed for biomedical applications include clustering: finding new biological classes or refining an existing one. We compare the various experimental clustering results of S+ from Insightful, XCluster at Stanford, Eisen’s Cluster, and Rousseau & Kaufman’s Web clusters for single linkag...


Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...


A Seriation Approach for Visualization-Driven Discovery of Co-Expression Patterns in Serial Analysis of Gene Expression (SAGE) Data

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. PRINCIPAL FINDINGS: Here we...


Publication date: 2001
